Text File | 1993-01-30 | 11KB | 364 lines
LhA V1.32 Application Info
File structure and algorithms.
By Stefan Boberg 1991,92
NB: This is an early version of the document, so it is not complete.
Format of a LZH / LHA file
--------------------------
LHA files have exactly the same file format and structure as LZH files,
but LHA files generally are compressed with -lh5- compression, while LZH
files generally are -lh1- compressed (see section about compression
algorithms).
Files can be stored in arbitrary order in the archive file.
The overall file format is as follows:
[file header]
[file data]
[file header]
[file data]
.
.
.
[archive terminator]
The file header is laid out as follows:
Case 1: (header level 0)

    Header size (in bytes)                  1 byte
    Header checksum                         1 byte
    Storage method                          5 bytes
    Compressed size                         4 bytes
    Original size                           4 bytes
    Last mod file date & time               4 bytes
    File attributes                         1 byte
    Header level [0]                        1 byte
    Filename length                         1 byte
    Filename & filenote                     variable size
    File CRC-16                             2 bytes
Case 2: (header level 1)

    Header size (in bytes)                  1 byte
    Header checksum                         1 byte
    Storage method                          5 bytes
    Compressed size                         4 bytes
    Original size                           4 bytes
    Last mod file date & time               4 bytes
    File attributes                         1 byte
    Header level [1]                        1 byte
    Filename length                         1 byte
    Filename & filenote                     variable size
    File CRC-16                             2 bytes
    Host Operating System                   1 byte
    Extension size                          2 bytes
    Extension data                          variable size
    ...
    Extension terminator [0]                2 bytes
Case 3: (header level 2)

    Header size (in bytes)                  2 bytes
    Storage method                          5 bytes
    Compressed size                         4 bytes
    Original size                           4 bytes
    Last mod file date & time (UNIX-Fmt)    4 bytes
    File attributes                         1 byte
    Header level [2]                        1 byte
    Filename length                         1 byte
    Filename & filenote                     variable size
    File CRC-16                             2 bytes
    Host Operating System                   1 byte
    Extension size                          2 bytes
    Extension data                          variable size
    ...
    Extension terminator [0]                2 bytes
The compressed file data follows immediately after the last header byte.
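As an illustration of the level-0 layout listed above, the following sketch
pulls the fixed fields out of a raw header buffer. The struct and function
names (lzh_header0, parse_header0) are hypothetical and not part of LhA; the
byte offsets are simply the running sum of the field sizes in the Case 1
table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of a level-0 header (illustrative names). */
struct lzh_header0 {
    unsigned header_size;        /* value of the header size byte */
    unsigned checksum;           /* value of the header checksum byte */
    char method[6];              /* 5-byte method ID plus a NUL added here */
    uint32_t compressed_size;
    uint32_t original_size;
    uint32_t dos_datetime;       /* packed MS-DOS date and time */
    unsigned attributes;
    unsigned level;
    unsigned name_length;
    const unsigned char *name;   /* points into buf, not NUL-terminated */
    unsigned crc16;
};

/* 32-bit value stored least significant byte first. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if buf is too short or is not a level-0 header. */
int parse_header0(const unsigned char *buf, size_t len, struct lzh_header0 *h)
{
    if (len < 22)
        return -1;                       /* fixed fields end at offset 21 */
    h->header_size = buf[0];
    h->checksum = buf[1];
    memcpy(h->method, buf + 2, 5);
    h->method[5] = '\0';
    h->compressed_size = le32(buf + 7);
    h->original_size = le32(buf + 11);
    h->dos_datetime = le32(buf + 15);
    h->attributes = buf[19];
    h->level = buf[20];
    if (h->level != 0)
        return -1;
    h->name_length = buf[21];
    if (len < 22 + (size_t)h->name_length + 2)
        return -1;                       /* name and CRC-16 must fit */
    h->name = buf + 22;
    h->crc16 = buf[22 + h->name_length] |
               ((unsigned)buf[23 + h->name_length] << 8);
    return 0;
}
```

The sketch does not verify the header checksum and does not interpret the
method ID; both are left to the caller.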
The archive terminator is a single 0 byte after the last data byte of the
last file in the archive.
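The header/data/terminator sequence can be walked with a simple loop. The
sketch below is restricted to archives that contain only level-0 headers and
that are held completely in memory; the function name count_level0_entries
is hypothetical. It assumes the next header starts right after the compressed
data of the previous file, as the layout above describes.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit value stored least significant byte first. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Counts the entries of an in-memory archive made of level-0 headers.
   Returns -1 if a header or its data runs past the buffer, if a header
   is not level 0, or if the terminating 0 byte is missing. */
long count_level0_entries(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    long count = 0;

    while (pos < len && buf[pos] != 0) {
        /* size byte + checksum byte + bytes counted by the size byte */
        size_t header_bytes = (size_t)buf[pos] + 2;
        uint32_t packed;

        if (header_bytes < 24 || len - pos < header_bytes)
            return -1;
        if (buf[pos + 20] != 0)
            return -1;                   /* not a level-0 header */
        packed = le32(buf + pos + 7);    /* compressed size field */
        if (len - pos - header_bytes < packed)
            return -1;
        pos += header_bytes + packed;    /* skip header and file data */
        count++;
    }
    return pos < len ? count : -1;       /* pos < len: terminator was seen */
}
```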
Explanation of fields
---------------------
All fields are encoded in Intel-format, i.e. 16-bit quantities are stored
with the least significant byte first. 32-bit quantities are stored as two
16-bit Intel words with the least significant word first.
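A minimal sketch of readers for these encodings follows. The function names
read_le16 and read_le32 are illustrative only. Reading byte by byte keeps the
result independent of the byte order of the machine running the code.

```c
#include <stdint.h>

/* 16-bit quantity, least significant byte first. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* 32-bit quantity: two 16-bit Intel words, least significant word first. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)read_le16(p) | ((uint32_t)read_le16(p + 2) << 16);
}
```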
Header size
This unsigned byte contains the length of the header, excluding the
header size byte itself and the header checksum byte.
With level-1 headers, the extended headers are NOT included in the
header size count (except for the first two-byte length word).
With level-2 headers, this is a two-byte word field containing the
length of the entire header, including all extended headers.
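The three cases can be folded into one helper. This is only a sketch: the
name header_span is hypothetical, and it relies on the header level byte
sitting at offset 20 in all three layouts, which follows from adding up the
field sizes in the header tables.

```c
#include <stddef.h>

/* Returns the number of header bytes covered by the header size field,
   counted from the start of the header, or -1 if the buffer is too short
   or the level is unknown. For level 1 the extended headers beyond the
   first two-byte length word are not part of the result. */
long header_span(const unsigned char *buf, size_t len)
{
    if (len < 21)
        return -1;                       /* level byte is at offset 20 */
    switch (buf[20]) {
    case 0:
    case 1:
        /* one-byte size; add the size byte and the checksum byte */
        return (long)buf[0] + 2;
    case 2:
        /* two-byte size that already covers the whole header */
        return (long)buf[0] | ((long)buf[1] << 8);
    default:
        return -1;
    }
}
```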
Header checksum
This byte contains the modulo-256 checksum of the header, which is
calculated as follows (pseudo-C):
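    {
        unsigned char header[];    /* raw header, starting at the size byte */
        unsigned char length;
        unsigned char checksum;

        checksum = 0;
        length = header[0];        /* Header size field */
        while (length) {
            /* sums header[2] .. header[length + 1]; the size byte and
               the checksum byte themselves are not included */
            checksum += header[length + 1];
            length--;
        }
        /* checksum now contains the checksum */
    }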
Storage method
This is a 5-byte ASCII char array containing the storage method ID.
See the section about compression methods for a list of IDs.
Compressed size
Original size
These 4-byte fields contain the size of the file in its compressed
and original state, respectively.
Last file modification date & time
The date and time is encoded in standard MS-DOS format. The 32-bit word
is divided into bit fields like this:
    Bits 31 - 25    Year - 1980
         24 - 21    Month        [1..12]
         20 - 16    Day          [1..31]
         15 - 11    Hour         [0..23]
         10 -  5    Minute       [0..59]
          4 -  0    Seconds/2    [0..29]
With level-2 headers the date is stored in UNIX format instead. A UNIX
timestamp is a 32-bit integer containing the number of seconds since
January 1, 1970, 00:00:00 UTC.
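The MS-DOS bit fields used by level-0 and level-1 headers can be unpacked
with shifts and masks. The struct and function names below are hypothetical;
the sketch performs no range checking on the decoded values.

```c
#include <stdint.h>

/* Broken-down MS-DOS timestamp (illustrative type). */
struct dos_datetime {
    int year, month, day, hour, minute, second;
};

/* Unpacks the 32-bit MS-DOS date/time word used by level-0/1 headers. */
struct dos_datetime decode_dos_datetime(uint32_t v)
{
    struct dos_datetime t;

    t.year   = (int)((v >> 25) & 0x7F) + 1980;   /* bits 31-25 */
    t.month  = (int)((v >> 21) & 0x0F);          /* bits 24-21 */
    t.day    = (int)((v >> 16) & 0x1F);          /* bits 20-16 */
    t.hour   = (int)((v >> 11) & 0x1F);          /* bits 15-11 */
    t.minute = (int)((v >> 5) & 0x3F);           /* bits 10-5  */
    t.second = (int)(v & 0x1F) * 2;              /* bits 4-0, 2 s units */
    return t;
}
```

Because seconds are stored divided by two, odd second values cannot be
represented in this format.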
File attributes
This byte field contains the file attribute bits. The format depends
on the host operating system.
Header level
This byte field indicates what kind of header this is. It can
currently be 0 (original LhArc format), 1 or 2 (Unix LHarc/LHA
format).
Filename length
This field contains the length (in bytes) of the filename.
Amiga LhArc/LhA stores filenotes in level-0 headers inside the filename
field. The filename is not normally null-terminated, but when a filenote
is present a null byte is appended to the filename and the filenote
follows it. The null byte and the filenote are both included in the
filename length count. This way of storing filenotes is compatible with
all versions of LhArc, so Amiga LZH archives with filenotes can be
processed on other platforms without problems.
Filename & filenote
This field contains the filename and (optional) filenote.
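Separating the two parts of a level-0 name field comes down to looking for
the null byte. A possible sketch, with the hypothetical function name
split_name_field:

```c
#include <stddef.h>
#include <string.h>

/* Splits a level-0 filename field of field_len bytes.
   Returns the length of the filename part. If a filenote is present,
   *note points at it and *note_len holds its length; otherwise *note
   is NULL and *note_len is 0. */
size_t split_name_field(const unsigned char *field, size_t field_len,
                        const unsigned char **note, size_t *note_len)
{
    const unsigned char *nul = memchr(field, 0, field_len);

    if (nul == NULL) {                   /* plain filename, no filenote */
        *note = NULL;
        *note_len = 0;
        return field_len;
    }
    *note = nul + 1;                     /* filenote follows the null byte */
    *note_len = field_len - (size_t)(nul - field) - 1;
    return (size_t)(nul - field);
}
```

Neither part is null-terminated by this function; callers that need C
strings have to copy the bytes and add the terminator themselves.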
File CRC-16
This field contains the CRC-16 of the source (uncompressed) file.
It is used to check the integrity of the archive during extract and
test operations.
CRC
---
The CRC is a standard ANSI 16-bit CRC. It is calculated as follows:
(Pseudo-C)
    unsigned short calcCRC(unsigned char *buffer, unsigned int length)
    {
        unsigned short crc;
        unsigned int i;
        unsigned char c;

        crc = 0;
        i = 0;
        while (i < length) {
            c = buffer[i++];
            /* combine the next byte with the low 8 bits of the CRC and
               use the result as the table index */
            crc = crctable[(crc ^ c) & 0xFF] ^ (crc >> 8);
        }
        return crc;
    }
The CRC-table is built as follows:
    unsigned short crctable[256];

    void make_crctable(void)
    {
        unsigned int i, j, r;

        for (i = 0; i < 256; i++) {
            r = i;
            /* shift out 8 bits, applying the polynomial 0xA001 whenever
               the bit shifted out is set */
            for (j = 0; j < 8; j++)
                if (r & 1) r = (r >> 1) ^ 0xA001;
                else r >>= 1;
            crctable[i] = r;
        }
    }
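For checking an implementation it can help to have a second, table-free
routine to compare against. The sketch below computes the same CRC one bit
at a time, using the same polynomial constant (0xA001) and the same start
value (0) as the code above. The function name crc16_bitwise is
hypothetical.

```c
#include <stddef.h>

/* Bitwise CRC-16 with polynomial constant 0xA001 and start value 0.
   Slower than the table-driven version but needs no setup. */
unsigned short crc16_bitwise(const unsigned char *buf, size_t len)
{
    unsigned int crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= buf[i];                   /* fold next byte into low 8 bits */
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
    }
    return (unsigned short)crc;
}
```

For the nine ASCII bytes "123456789" this variant of CRC-16 yields 0xBB3D,
which makes a convenient self-test value.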
Extended headers
-----------------
The `extended headers' are used in level-1 and level-2 headers to store
optional or variably-sized information such as filenotes, operating-system
specific attributes etc. The general structure of an extended header is:
    Length    [2 bytes]
    Type      [1 byte]
    Data      [Length - 3 bytes]

The length count includes the length, type and data fields, i.e.
data field length + 3 = Length.
The extended-headers block is terminated by 2 zero bytes (zero length).
The currently implemented headers are:
    Type    Name                Data
    ----    ----                ----
    0       Common header       Header CRC16
    1       Filename header     ASCII string of filename, excluding
                                directory names
    2       Dirname header      ASCII string of directory name,
                                excluding trailing slash. Node delimiter
                                is 0xFF (octal 0377, decimal 255)
    0x40    Attribute header    Two-byte word containing file
                                attributes. This overrides the attribute
                                field in the main header.
    0x71    Filenote header     ASCII string of filenote
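A block of extended headers laid out as described in this section can be
stepped through one header at a time. The sketch below assumes the buffer
starts at the first two-byte length word and is fully in memory; the enum
constants and the function name next_ext_header are hypothetical.

```c
#include <stddef.h>

/* Extended header types from the table above (illustrative names). */
enum {
    EXT_COMMON    = 0x00,
    EXT_FILENAME  = 0x01,
    EXT_DIRNAME   = 0x02,
    EXT_ATTRIBUTE = 0x40,
    EXT_FILENOTE  = 0x71
};

/* Reads the extended header at *pos and advances *pos past it.
   Returns 1 if a header was read, 0 at the zero-length terminator,
   and -1 if the block is truncated or a length is below 3.
   *pos must start at 0 and only be changed by this function. */
int next_ext_header(const unsigned char *buf, size_t len, size_t *pos,
                    unsigned *type, const unsigned char **data,
                    size_t *data_len)
{
    size_t ext_len;

    if (len - *pos < 2)
        return -1;                       /* no room for a length word */
    ext_len = buf[*pos] | ((size_t)buf[*pos + 1] << 8);
    if (ext_len == 0)
        return 0;                        /* terminator reached */
    if (ext_len < 3 || ext_len > len - *pos)
        return -1;
    *type = buf[*pos + 2];
    *data = buf + *pos + 3;
    *data_len = ext_len - 3;             /* Length covers length+type+data */
    *pos += ext_len;
    return 1;
}
```

A caller would typically loop while the function returns 1 and switch on
the reported type, ignoring types it does not know.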
Compression modes
-----------------
Currently, a file can be stored in the archive in one of four ways: it
can be STORED (not compressed) or FROZEN (compressed) in three different
ways. Directory entries have a method ID of their own. The method IDs are
listed in the table below:
    Method       ID
    ------------------
    Stored       -lh0-
    Frozen       -lh1-
    Frozen       -lh4-
    Frozen       -lh5-
    Directory    -lhd-
    ------------------
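Since the method ID is a fixed 5-byte field without a terminator, it is
best compared with memcmp rather than strcmp. A small sketch of mapping the
IDs from the table to an enum follows; the enum and function names are
hypothetical.

```c
#include <string.h>

enum lzh_method {
    LZH_STORED,          /* -lh0- */
    LZH_FROZEN_LH1,      /* -lh1- */
    LZH_FROZEN_LH4,      /* -lh4- */
    LZH_FROZEN_LH5,      /* -lh5- */
    LZH_DIRECTORY,       /* -lhd- */
    LZH_UNKNOWN          /* anything not listed in the table */
};

/* Maps the 5-byte storage method field to one of the listed methods. */
enum lzh_method classify_method(const char id[5])
{
    static const struct {
        const char *id;
        enum lzh_method method;
    } table[] = {
        { "-lh0-", LZH_STORED },
        { "-lh1-", LZH_FROZEN_LH1 },
        { "-lh4-", LZH_FROZEN_LH4 },
        { "-lh5-", LZH_FROZEN_LH5 },
        { "-lhd-", LZH_DIRECTORY }
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (memcmp(id, table[i].id, 5) == 0)
            return table[i].method;
    return LZH_UNKNOWN;
}
```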
I. STORED
Compression:
A stored file is not compressed. The file data should be copied
directly from the source file to the archive. The CRC16 for the file
must be calculated and stored in the header for data integrity checks.
Decompression:
A stored file is not compressed. The file data can be copied
directly from the archive to the destination, while calculating the
CRC16 for the file.
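Copying and checksumming can be done in a single pass. The sketch below
works on memory buffers rather than files to stay short, and computes the
CRC bit by bit with the polynomial constant 0xA001 from the CRC section.
The function name extract_stored is hypothetical.

```c
#include <stddef.h>

/* Copies a stored (-lh0-) entry from src to dst while computing its
   CRC-16. Returns 0 if the CRC matches expected_crc, -1 otherwise.
   dst must have room for len bytes. */
int extract_stored(const unsigned char *src, size_t len,
                   unsigned short expected_crc, unsigned char *dst)
{
    unsigned int crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        dst[i] = src[i];                 /* plain copy, no decoding */
        crc ^= src[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
    }
    return crc == expected_crc ? 0 : -1;
}
```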
II. FROZEN (-lh1-)
Compression:
LZ77 with a 4096-byte window. Literals and copies are encoded with
dynamic order-0 Huffman codes. Distance codes are encoded with fixed
order-0 Huffman codes.
[ No algorithm description in this early document version ]
Decompression:
[ No algorithm description in this early document version ]
III. FROZEN (-lh4-)
Compression:
This method is exactly the same as -lh5-, but with a window size
of 4096 characters. See the description of -lh5- for more info.
Decompression:
This method is exactly the same as -lh5- and can be decompressed
with the same decompression routine. There is no difference between
-lh5- and -lh4- from the decompressor's point of view.
IV. FROZEN (-lh5-)
Compression:
LZ77 with an 8192-byte window. Literals and copies are encoded with
block-adaptive order-0 Huffman codes. The number of distance bits is
encoded with another set of block-adaptive Huffman codes.
[ No algorithm description in this early document version ]
Decompression:
No buffer initialization required.
[ No algorithm description in this early document version ]
V. Directory (-lhd-)
Compression:
No Compression. Set the CRC-16 field to 0000. The directory name
should include a trailing slash. (like in `dir1/dir2/', and not
`dir1/dir2')
Decompression:
No Compression. Just create the directory whose name is in the
filename field.